{ "cells": [ { "cell_type": "markdown", "id": "69ef9e14", "metadata": {}, "source": [ "## **Single Stage -- Paradigm 1**\n", "\n", "### Real Data 1. MovieLens\n", "\n", "MovieLens is a movie recommendation website that helps users find movies and collects their ratings. In single-stage causal effect learning, our goal with this dataset is to infer the causal effect of treating users with the genre 'Drama' versus the control genre 'Sci-Fi'. This serves as an offline evaluation of how much users like or dislike one genre relative to the other, and hence gives a general indication of which genre to recommend so as to maximize user satisfaction.\n" ] }, { "cell_type": "markdown", "id": "Vx3GPf3t1Eo3", "metadata": { "id": "Vx3GPf3t1Eo3" }, "source": [ "#### Data Pre-processing" ] }, { "cell_type": "code", "execution_count": 1, "id": "21378417", "metadata": {}, "outputs": [], "source": [ "# import related packages\n", "import os\n", "import pickle\n", "import numpy as np\n", "\n", "from causaldm.learners.CPL4.CMAB import _env_realCMAB as _env\n", "data = _env.get_movielens()" ] }, { "cell_type": "code", "execution_count": 2, "id": "66804173", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['Individual', 'Xs', 'mean_ri', 'standardized_Xs'])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.keys()" ] }, { "cell_type": "code", "execution_count": 3, "id": "21dded94", "metadata": {}, "outputs": [], "source": [ "data_ML = data['Individual']" ] }, { "cell_type": "code", "execution_count": 4, "id": "5988cfc7", "metadata": {}, "outputs": [], "source": [ "userinfo_index = np.array([3,9,11,12,13,14])\n", "\n", "users_index = data_ML.keys()\n", "n = len(users_index) # the number of users\n", "movie_generes = ['Comedy', 'Drama', 'Action', 'Thriller', 'Sci-Fi']\n", "\n", "data_CEL = {}\n", "\n", "# initialize the final data we'll use in Causal Effect Learning\n", "for i in movie_generes:\n", 
" data_CEL[i] = None \n", "\n", "import pandas as pd\n", "for movie_genere in movie_generes:\n", " for user in users_index:\n", " data_CEL[movie_genere] = pd.concat([data_CEL[movie_genere] , data_ML[user][movie_genere]['complete']])\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "a4b8fa79", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user_id movie_id rating age Comedy Drama Action Thriller \\\n", "4220 48 2355.0 4.0 25.0 1.0 0.0 0.0 0.0 \n", "14400 48 2918.0 4.0 25.0 1.0 0.0 0.0 0.0 \n", "16752 48 2791.0 4.0 25.0 1.0 0.0 0.0 0.0 \n", "20195 48 2797.0 4.0 25.0 1.0 0.0 0.0 0.0 \n", "21689 48 2321.0 3.0 25.0 1.0 0.0 0.0 0.0 \n", "... ... ... ... ... ... ... ... ... \n", "393463 5878.0 3299.0 3.0 25.0 1.0 0.0 0.0 0.0 \n", "395410 5878.0 892.0 5.0 25.0 1.0 0.0 0.0 0.0 \n", "396058 5878.0 574.0 1.0 25.0 1.0 0.0 0.0 0.0 \n", "397794 5878.0 1812.0 5.0 25.0 1.0 0.0 0.0 0.0 \n", "400719 5878.0 3830.0 1.0 25.0 1.0 0.0 0.0 0.0 \n", "\n", " Sci-Fi gender_M occupation_academic/educator \\\n", "4220 0.0 1.0 0.0 \n", "14400 0.0 1.0 0.0 \n", "16752 0.0 1.0 0.0 \n", "20195 0.0 1.0 0.0 \n", "21689 0.0 1.0 0.0 \n", "... ... ... ... \n", "393463 0.0 0.0 0.0 \n", "395410 0.0 0.0 0.0 \n", "396058 0.0 0.0 0.0 \n", "397794 0.0 0.0 0.0 \n", "400719 0.0 0.0 0.0 \n", "\n", " occupation_college/grad student occupation_executive/managerial \\\n", "4220 1.0 0.0 \n", "14400 1.0 0.0 \n", "16752 1.0 0.0 \n", "20195 1.0 0.0 \n", "21689 1.0 0.0 \n", "... ... ... \n", "393463 0.0 0.0 \n", "395410 0.0 0.0 \n", "396058 0.0 0.0 \n", "397794 0.0 0.0 \n", "400719 0.0 0.0 \n", "\n", " occupation_other occupation_technician/engineer \n", "4220 0.0 0.0 \n", "14400 0.0 0.0 \n", "16752 0.0 0.0 \n", "20195 0.0 0.0 \n", "21689 0.0 0.0 \n", "... ... ... \n", "393463 1.0 0.0 \n", "395410 1.0 0.0 \n", "396058 1.0 0.0 \n", "397794 1.0 0.0 \n", "400719 1.0 0.0 \n", "\n", "[49563 rows x 15 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_CEL['Comedy']" ] }, { "cell_type": "code", "execution_count": 29, "id": "a10030cd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " user_id movie_id rating age Drama gender_M \\\n", "14 48 1193.0 4.0 25.0 1.0 1.0 \n", "11057 48 919.0 4.0 25.0 1.0 1.0 \n", "25871 48 527.0 5.0 25.0 1.0 1.0 \n", "31166 48 1721.0 4.0 25.0 1.0 1.0 \n", "40383 48 150.0 4.0 25.0 1.0 1.0 \n", "... ... ... ... ... ... ... \n", "303406 5878.0 3300.0 2.0 25.0 0.0 0.0 \n", "320275 5878.0 1391.0 1.0 25.0 0.0 0.0 \n", "332011 5878.0 185.0 4.0 25.0 0.0 0.0 \n", "382221 5878.0 2232.0 1.0 25.0 0.0 0.0 \n", "397209 5878.0 426.0 3.0 25.0 0.0 0.0 \n", "\n", " occupation_academic/educator occupation_college/grad student \\\n", "14 0.0 1.0 \n", "11057 0.0 1.0 \n", "25871 0.0 1.0 \n", "31166 0.0 1.0 \n", "40383 0.0 1.0 \n", "... ... ... \n", "303406 0.0 0.0 \n", "320275 0.0 0.0 \n", "332011 0.0 0.0 \n", "382221 0.0 0.0 \n", "397209 0.0 0.0 \n", "\n", " occupation_executive/managerial occupation_other \\\n", "14 0.0 0.0 \n", "11057 0.0 0.0 \n", "25871 0.0 0.0 \n", "31166 0.0 0.0 \n", "40383 0.0 0.0 \n", "... ... ... \n", "303406 0.0 1.0 \n", "320275 0.0 1.0 \n", "332011 0.0 1.0 \n", "382221 0.0 1.0 \n", "397209 0.0 1.0 \n", "\n", " occupation_technician/engineer \n", "14 0.0 \n", "11057 0.0 \n", "25871 0.0 \n", "31166 0.0 \n", "40383 0.0 \n", "... ... 
\n", "303406 0.0 \n", "320275 0.0 \n", "332011 0.0 \n", "382221 0.0 \n", "397209 0.0 \n", "\n", "[65642 rows x 11 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# stack the 'Drama' (treatment) and 'Sci-Fi' (control) records\n", "data_CEL_all = pd.concat([data_CEL['Drama'], data_CEL['Sci-Fi']])\n", "# drop the other genre indicators; the 'Drama' column is kept as the treatment indicator\n", "data_CEL_all = data_CEL_all.drop(columns=['Comedy', 'Action', 'Thriller', 'Sci-Fi'])\n", "#data_CEL_all.to_csv(\"causaldm/data/MovieLens_CEL.csv\")\n", "data_CEL_all" ] }, { "cell_type": "code", "execution_count": null, "id": "0299448e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "o0IBWCOH1Jip", "metadata": { "id": "o0IBWCOH1Jip" }, "source": [ "#### Final MovieLens Data Selected for Causal Effect Learning (CEL)\n", "\n", "After pre-processing, the complete data contains 65,642 movie-watching records from 175 individuals. We set the treatment $A=1$ when the user chooses a 'Drama' movie, and $A=0$ when the movie belongs to 'Sci-Fi'; this indicator is the `Drama` column of `data_CEL_all`.\n", "\n", "The processed data is saved in 'causaldm/data/MovieLens_CEL.csv' and will be used directly in later subsections." ] }, { "cell_type": "markdown", "id": "55907739", "metadata": { "id": "yfiI-lSZRuOP" }, "source": [ "### Real Data 2. Mimic3\n", "https://www.kaggle.com/datasets/asjad99/mimiciii\n", "\n", "\n", "\n", "Mimic3 is a large, open-access, anonymized, single-center database that contains comprehensive clinical data on 61,532 critical care admissions collected at a Boston teaching hospital from 2001 to 2012. 
The dataset consists of 47 features (including demographics, vitals, and lab test results) on a cohort of sepsis patients who meet the Sepsis-3 definition criteria.\n", "\n", "In causal effect learning, we try to estimate the treatment effect of a specific intervention (e.g., use of a ventilator), either for a particular patient given that patient's characteristics and physiological information, or for all patients as a whole.\n", "\n", "The original Mimic3 data is loaded from `mimic3_sepsis_data.csv`. For illustration purposes, we select several representative features for the following analysis.\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "19e51435", "metadata": { "id": "Vx3GPf3t1Eo3" }, "source": [ "#### Data Pre-processing" ] }, { "cell_type": "code", "execution_count": 6, "id": "6bb9f17d", "metadata": { "id": "eRpP5k9MBtzO" }, "outputs": [], "source": [ "# import related packages\n", "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt\n", "from sklearn.linear_model import LinearRegression\n", "#from causaldm.data import mimic3_sepsis_data" ] }, { "cell_type": "code", "execution_count": 8, "id": "JhfJntzcVVy2", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 488 }, "executionInfo": { "elapsed": 2636, "status": "ok", "timestamp": 1676736771987, "user": { "displayName": "Yang Xu", "userId": "12270366590264264299" }, "user_tz": 300 }, "id": "JhfJntzcVVy2", "outputId": "a1da8377-d00b-4256-ede3-e35485c8e3d6" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " bloc icustayid charttime gender age elixhauser \\\n", "0 1 3 7245486000 0 17639.826435 0 \n", "1 1 11 6898241400 1 30766.069028 6 \n", "2 1 12 5805732000 1 12049.217303 0 \n", "3 1 14 4264269300 0 30946.970000 2 \n", "4 1 30 5707825200 0 19793.588912 6 \n", "5 1 33 7214122800 0 24524.747419 5 \n", "\n", " re_admission died_in_hosp died_within_48h_of_out_time mortality_90d \\\n", "0 0 0 0 1 \n", "1 1 0 0 0 \n", "2 0 0 0 0 \n", "3 0 0 0 1 \n", "4 0 0 0 0 \n", "5 0 1 1 1 \n", "\n", " ... input_total input_4hourly output_total output_4hourly \\\n", "0 ... 6527.0000 50.0 13617.0 520.0 \n", "1 ... 0.0000 0.0 0.0 0.0 \n", "2 ... 0.0000 0.0 0.0 0.0 \n", "3 ... 1300.0000 1300.0 340.0 160.0 \n", "4 ... 9552.0000 50.0 6830.0 540.0 \n", "5 ... 10661.0483 725.0 5746.0 360.0 \n", "\n", " cumulated_balance SOFA SIRS vaso_input iv_input reward \n", "0 -7090.0000 5 1 0.0 2.0 -0.884898 \n", "1 0.0000 12 0 0.0 0.0 0.383136 \n", "2 0.0000 4 2 0.0 0.0 0.976040 \n", "3 960.0000 5 2 0.0 4.0 0.125000 \n", "4 2722.0000 6 2 0.0 2.0 0.457625 \n", "5 4915.0483 4 0 0.0 4.0 1.049099 \n", "\n", "[6 rows x 62 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get data\n", "mimic3_data = pd.read_csv(\"mimic3_sepsis_data.csv\")\n", "mimic3_data.head(6)" ] }, { "cell_type": "code", "execution_count": 9, "id": "Nxf-mXJmFrEl", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "executionInfo": { "elapsed": 285, "status": "ok", "timestamp": 1676737621928, "user": { "displayName": "Yang Xu", "userId": "12270366590264264299" }, "user_tz": 300 }, "id": "Nxf-mXJmFrEl", "outputId": "f84dd68e-1bcd-4092-ed7e-eb0ba69c6058" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Glucose paO2 PaO2_FiO2 iv_input SOFA reward\n", "0 84.000000 84.000000 168.000000 2.0 5 -0.884898\n", "1 122.000000 59.444444 198.148148 0.0 12 0.383136\n", "2 125.000000 192.000000 690.647482 0.0 4 0.976040\n", "3 110.727273 179.000000 447.499993 4.0 5 0.125000\n", "4 187.000000 125.000000 347.222222 2.0 6 0.457625\n", "... ... ... ... ... ... ...\n", "4995 121.375000 136.787683 206.005547 3.0 4 -1.965110\n", "4996 108.000000 62.333333 143.846153 0.0 11 -0.025000\n", "4997 106.000000 258.500000 923.214286 0.0 7 0.402531\n", "4998 144.000000 376.000000 752.000000 1.0 4 -0.172130\n", "4999 113.000000 108.000000 269.999996 4.0 5 -0.025000\n", "\n", "[5000 rows x 6 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "selected = ['Glucose','paO2','PaO2_FiO2', 'iv_input', 'SOFA','reward']\n", "n = 5000\n", "mimic3_data_selected = mimic3_data[:n][selected]\n", "mimic3_data_selected" ] }, { "cell_type": "code", "execution_count": 27, "id": "J__3Ozs7Uxxs", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "executionInfo": { "elapsed": 324, "status": "ok", "timestamp": 1676738479989, "user": { "displayName": "Yang Xu", "userId": "12270366590264264299" }, "user_tz": 300 }, "id": "J__3Ozs7Uxxs", "outputId": "8fb59a44-73f8-4efb-99cb-f0cf05f202db" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Glucose paO2 PaO2_FiO2 iv_input SOFA reward\n", "0 84.000000 84.000000 168.000000 1.0 5 -0.884898\n", "1 122.000000 59.444444 198.148148 0.0 12 0.383136\n", "2 125.000000 192.000000 690.647482 0.0 4 0.976040\n", "3 110.727273 179.000000 447.499993 1.0 5 0.125000\n", "4 187.000000 125.000000 347.222222 1.0 6 0.457625" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "userinfo_index = np.array([0,1,2,4]) # column indices of the patients' baseline information\n", "SandA = mimic3_data_selected.iloc[:, np.array([0,1,2,3,4])]\n", "\n", "# work on a copy so that mimic3_data_selected keeps the original fluid volumes\n", "data_CEL_selected = mimic3_data_selected.copy()\n", "# change the discrete action to binary: recode only the iv_input column,\n", "# leaving the state variables and the reward untouched\n", "data_CEL_selected.loc[data_CEL_selected['iv_input'] != 0, 'iv_input'] = 1\n", "data_CEL_selected.head(5)" ] }, { "cell_type": "markdown", "id": "11a26dc0", "metadata": { "id": "o0IBWCOH1Jip" }, "source": [ "#### Final Mimic3 Data Selected for Causal Effect Learning (CEL)\n", "\n", "After pre-processing, we select 4 features as the state variables in CEL, which represent the patients' baseline information:\n", "* **Glucose**: the patient's glucose level.\n", "* **paO2**: the partial pressure of oxygen.\n", "* **PaO2_FiO2**: the ratio of the partial pressure of oxygen (PaO2) to the fraction of oxygen delivered (FiO2).\n", "* **SOFA**: the Sepsis-related Organ Failure Assessment score, which describes organ dysfunction/failure.\n", "\n", "The action variable is **iv_input**, which denotes the volume of fluids that have been administered to the patient. 
Additionally, we set all non-zero iv_input values to $1$ to create a binary action space.\n", "\n", "The last column, **reward**, is the reward evaluated from several aspects of the patient's status.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "pVh5bd-_4I3x", "metadata": { "id": "pVh5bd-_4I3x" }, "outputs": [], "source": [] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }